The Annals of Applied Probability 

2007, Vol. 17, No. 1, 67-80 

DOI: 10.1214/105051606000000600 

© Institute of Mathematical Statistics, 2007 

A FLEMING-VIOT PROCESS AND BAYESIAN
NONPARAMETRICS

By Stephen G. Walker, Spyridon J. Hatjispyros

and Theodoros Nicoleris

University of Kent, University of the Aegean and University of the Aegean 

This paper provides a construction of a Fleming-Viot measure-valued
diffusion process, for which the transition function is known,
by extending recent ideas of the Gibbs sampler based Markov processes.
In particular, we concentrate on the Chapman-Kolmogorov
consistency conditions, which allow a simple derivation of such a
Fleming-Viot process, once a key and apparently new combinatorial
result for Polya-urn sequences has been established.

1. Introduction. The Fleming-Viot process, introduced by Fleming and
Viot [6], is a measure-valued diffusion process. The stationary distribution
of the process is $\Pi$, where $\Pi$ is the distribution of a random measure $\mu$, on
some space $S$, and $\mu$ can be obtained via

\[
\mu(\cdot) = \sum_{i=1}^{\infty} p_i \delta_{V_i}(\cdot), \tag{1.1}
\]

where $p_1 > p_2 > \cdots$ have the Poisson-Dirichlet distribution [8] and $V_1, V_2, \ldots$
are independent and identically distributed from $\nu_0$, and independent of the
$p_i$. Such a random measure is also known as a Dirichlet process [5] and has
been of great importance to Bayesian nonparametric methods. To denote
the dependence on $(\theta, \nu_0)$, we will use the notation $\Pi(\theta\nu_0)$.

Ethier and Griffiths [4] provide the transition function for a particular
Fleming-Viot process. Let $d_n(t) = P(D_t = n)$, where $D_t$ is a death process,
$D_0 = \infty$ a.s., and with rate $\lambda_n = \frac{1}{2} n(n - 1 + \theta)$ for some $\theta > 0$. Tavaré [11],
for example, computed that, for $n = 1, 2, \ldots$,

\[
d_n(t) = \sum_{m=n}^{\infty} (-1)^{m-n} C(m, n) (\theta + n)_{(m-1)} \, m!^{-1} \gamma_{m,t,\theta}, \tag{1.2}
\]

where

\[
\gamma_{m,t,\theta} = (2m - 1 + \theta) e^{-\lambda_m t}
\]

and

\[
d_0(t) = 1 - \sum_{m=1}^{\infty} (-1)^{m-1} \theta_{(m-1)} \, m!^{-1} \gamma_{m,t,\theta}.
\]

Also,

\[
C(m, n) = \frac{m!}{(m - n)! \, n!}
\]

and $a_{(m)} = a(a + 1) \cdots (a + m - 1)$ for $m = 1, 2, \ldots$ with $a_{(0)} = 1$. We will also
use $a_{[m]} = a(a - 1) \cdots (a - m + 1)$ for $m = 1, 2, \ldots$ with $a_{[0]} = 1$. We will show,
among other things, that this death process is fundamentally connected with
the general Polya-urn scheme [3].
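As a numerical illustration of (1.2), the following minimal Python sketch evaluates $d_n(t)$ by truncating the alternating series. The function names (`rising`, `gamma_term`, `death_prob`) and the cutoff `terms` are assumptions made for this example and do not come from the paper. The truncation is only adequate when $t$ is not too close to zero, since the terms decay like $e^{-\lambda_m t}$.

```python
import math


def rising(a, m):
    """Ascending factorial a_(m) = a (a+1) ... (a+m-1), with a_(0) = 1."""
    out = 1.0
    for i in range(m):
        out *= a + i
    return out


def gamma_term(m, t, theta):
    """gamma_{m,t,theta} = (2m - 1 + theta) exp(-lambda_m t)."""
    lam = 0.5 * m * (m - 1 + theta)
    return (2 * m - 1 + theta) * math.exp(-lam * t)


def death_prob(n, t, theta, terms=60):
    """Truncated series for d_n(t) = P(D_t = n); `terms` is an assumed cutoff."""
    if n == 0:
        s = sum((-1) ** (m - 1) * rising(theta, m - 1) / math.factorial(m)
                * gamma_term(m, t, theta) for m in range(1, terms + 1))
        return 1.0 - s
    return sum((-1) ** (m - n) * math.comb(m, n) * rising(theta + n, m - 1)
               / math.factorial(m) * gamma_term(m, t, theta)
               for m in range(n, n + terms + 1))


if __name__ == "__main__":
    theta, t = 1.0, 1.0
    probs = [death_prob(n, t, theta) for n in range(30)]
    print([round(p, 4) for p in probs[:6]], sum(probs))
```

For moderate $t$ only the first few values of $n$ carry appreciable mass, which reflects how quickly the death process comes down from infinity.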
The transition function is given by

\[
P(t, \mu, d\nu) = \sum_{n=0}^{\infty} d_n(t) \int \Pi\Bigl(d\nu \Bigm| \theta\nu_0 + \sum_{i=1}^{n} \delta_{X_i}\Bigr) \mu(dX_1) \cdots \mu(dX_n). \tag{1.3}
\]

It is the intention of this paper to establish a comprehensive construction of
the process and the transition function using ideas formulated in Bayesian
nonparametrics relating to the Dirichlet process. The key result, which appears
new and involves an elegant combinatorial identity, is for sequences of
Polya-urns. We will also use recent ideas for constructing Markov processes
using latent variables, outlined in [10].

In Section 2 we provide background to the construction of the Fleming-Viot
process via discrete time processes associated with the Dirichlet process.
The necessary Chapman-Kolmogorov condition for existence in continuous
time is examined in Section 3. Section 4 contains technical results and
Section 5 concludes the paper with some points of discussion.



2. Stationary Markov processes using the Dirichlet process. In [10] the
use of the Dirichlet process for deriving the DAR(1) model was described.
Consider the joint distribution on $S \times \mathscr{P}(S)$, where $\mathscr{P}(S)$ is the space of
probability measures on $S$, given by

\[
P(d\mu, dX) = \mu(dX) \, \Pi(d\mu \mid \theta\nu_0).
\]

In words, $\mu$ is chosen from $\Pi$ and, given $\mu$, $X$ is chosen from $\mu$. By making
use of both conditional distributions, the conditional distribution for $\mu$ being

\[
P(d\mu \mid X) = \Pi(d\mu \mid \theta\nu_0 + \delta_X),
\]

a discrete time Markov process can be constructed on $S$ with transition
function

\[
P(X_t, dX_{t+1}) = \int \mu(dX_{t+1}) \, \Pi(d\mu \mid \theta\nu_0 + \delta_{X_t}).
\]

This is of the form

\[
P(X_t, dX_{t+1}) = \int P(dX_{t+1} \mid \mu) \, P(d\mu \mid X_t).
\]

A result in [2] gives

\[
\Pi(\cdot \mid \theta\nu_0) = \int \Pi(\cdot \mid \theta\nu_0 + \delta_X) \, \nu_0(dX) \tag{2.1}
\]

and it is well known that

\[
\int \mu \, \Pi(d\mu \mid \theta\nu_0) = \nu_0. \tag{2.2}
\]

Consequently, it is easy to show that $\nu_0$ is the stationary distribution of the
process. The process, using properties of the Gibbs sampler, is easily shown
to be reversible.

In fact, it follows that

\[
X_{t+1} \begin{cases} \sim \nu_0, & \text{with probability } \theta/(1 + \theta), \\ = X_t, & \text{with probability } 1/(1 + \theta). \end{cases}
\]

This is the DAR(1) model.
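The DAR(1) transition above is simple to simulate. The sketch below is only an illustration: the function name `simulate_dar1`, the callback `draw_nu0` and the choice of a standard normal base measure in the usage lines are assumptions made for the example, not part of the paper.

```python
import random


def simulate_dar1(theta, steps, draw_nu0, rng):
    """Simulate X_0, ..., X_steps from the DAR(1) chain.

    With probability theta / (1 + theta) the next state is a fresh draw
    from nu_0; otherwise the chain stays where it is.
    """
    x = draw_nu0(rng)              # start in the stationary distribution nu_0
    path = [x]
    for _ in range(steps):
        if rng.random() < theta / (1.0 + theta):
            x = draw_nu0(rng)      # refresh from the base measure
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    # nu_0 is taken to be standard normal purely for illustration.
    print(simulate_dar1(2.0, 10, lambda r: r.gauss(0.0, 1.0), rng))
```

With a continuous $\nu_0$, the long-run fraction of steps on which the chain does not move should be close to $1/(1+\theta)$.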

Alternatively, we could consider the measure-valued process on $\mathscr{P}(S)$.
This would have transition function given by

\[
P(\mu, d\nu) = \int \Pi(d\nu \mid \theta\nu_0 + \delta_X) \, \mu(dX).
\]

Using (2.1) and (2.2), it is straightforward to show that $\Pi$ is the stationary
distribution of the process and that it is also reversible.

Instead of having a single observation from $\mu$, we could consider the joint
distribution on $S^n \times \mathscr{P}(S)$ given by

\[
P(dX_1, \ldots, dX_n, d\mu) = \Pi(d\mu \mid \theta\nu_0) \prod_{i=1}^{n} \mu(dX_i).
\]

It is well known that the "posterior" or conditional distribution of $\mu$ given
$X_1, \ldots, X_n$ has the form

\[
\Pi\Bigl(\cdot \Bigm| \theta\nu_0 + \sum_{i=1}^{n} \delta_{X_i}\Bigr);
\]

see [5]. Hence, in this case, the transition function for the measure-valued
process is

\[
P(\mu, d\nu) = \int \Pi\Bigl(d\nu \Bigm| \theta\nu_0 + \sum_{i=1}^{n} \delta_{X_i}\Bigr) \mu(dX_1) \cdots \mu(dX_n). \tag{2.3}
\]

This is beginning to resemble the transition function for the Fleming-Viot
process given in (1.3), though obviously for discrete time. To obtain the
Fleming-Viot process, we need to put the processes we have constructed
into continuous time. To achieve this, it is necessary to make the number of
samples $n$ random. So, using the notation of Ethier and Griffiths in [4],
we denote by $d_n(t)$ the probability that the number of samples being used
is $n$ for the transition with time $t$.

We are now ready to examine the existence of such a process by looking 
at the Chapman-Kolmogorov equations. 

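3. Chapman-Kolmogorov conditions. We consider the transition from
$\mu_0$ to $\mu_t$ and then to $\mu_{t+s}$. Thus, we have

\[
P(d\mu_{t+s} \mid Y_1, \ldots, Y_m, m) = \Pi\Bigl(d\mu_{t+s} \Bigm| \theta\nu_0 + \sum_{j=1}^{m} \delta_{Y_j}\Bigr),
\]

where $Y_1, \ldots, Y_m$ are independent and identically distributed from $\mu_t$ and
$P(m) = d_m(s)$. Also,

\[
P(d\mu_t \mid X_1, \ldots, X_n, n) = \Pi\Bigl(d\mu_t \Bigm| \theta\nu_0 + \sum_{i=1}^{n} \delta_{X_i}\Bigr),
\]

where $X_1, \ldots, X_n$ are independent and identically distributed from $\mu_0$ and
$P(n) = d_n(t)$. Now, it is well known that we can integrate out $\mu_t$ and that

\[
P(Y_1, \ldots, Y_m \mid X_1, \ldots, X_n, n) = \mathcal{Q}\Bigl(\theta\nu_0 + \sum_{i=1}^{n} \delta_{X_i}\Bigr),
\]

where $\mathcal{Q}$ denotes a distribution associated with the general Polya-urn
scheme [2, 3]. In terms of sampling, we take the $Y_1, Y_2, \ldots$ by sampling
$Y_1 \sim \nu_n$, where

\[
\nu_n = \frac{\theta\nu_0 + \sum_{i=1}^{n} \delta_{X_i}}{\theta + n}
\]

and then, subsequently,

\[
P(Y_j \mid Y_1, \ldots, Y_{j-1}) = \frac{(\theta + n)\nu_n + \sum_{l=1}^{j-1} \delta_{Y_l}}{\theta + n + j - 1}.
\]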

Such a sampling scheme, which in the Bayesian nonparametric literature
is known as the Polya-urn scheme, also appears in mathematical genetics
and is connected with the Poisson-Dirichlet process. See, for example, [11].
Since the $X_i$ are independent and identically distributed from $\mu_0$, to achieve
the Chapman-Kolmogorov condition, we need to understand how many of
the $Y_j$'s are independent and identically distributed from $\mu_0$. For there to
be $r$ of them, we obviously require $m \ge r$ and $n \ge r$. To expand on this, a
number of the $Y_j$'s will be identical to some of the $X_i$'s. We are looking for
the number of distinct indices associated with these $X_i$'s. So, for example,
if we collect up $\{X_2, X_4, X_2, X_1, X_6, X_4\}$ from sampling the $Y_j$'s, then the
appropriate number is $r = 4$; that is, we have the distinct indices $\{1, 2, 4, 6\}$.

Theorem 3.1. Conditionally on $n$ and $m$, we have that the probability
mass function for $r$ is given, for $r \in \{0, \ldots, \min\{n, m\}\}$, by

\[
P(r \mid m, n) = \frac{n_{[r]} (\theta + r)_{(m-r)}}{(\theta + n)_{(m)}} C(m, r),
\]

which can be written in extended form as

\[
P(r \mid m, n) = r! \, C(n, r) C(m, r) \frac{\theta_{(n)} \theta_{(m)}}{\theta_{(n+m)} \theta_{(r)}}.
\]
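The mass function in Theorem 3.1 can be checked against a direct draw-by-draw evaluation of the urn. The sketch below is an illustration under our own assumptions: the helper names are hypothetical, and the recursion in `pmf_urn` relies on the observation that, before draw $j$, each of the $n - r$ indices not yet hit still carries weight one out of a total weight $\theta + n + j - 1$. Exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from math import comb


def rising(a, k):
    """Ascending factorial a_(k) = a (a+1) ... (a+k-1), with a_(0) = 1."""
    out = Fraction(1)
    for i in range(k):
        out *= a + i
    return out


def falling(a, k):
    """Descending factorial a_[k] = a (a-1) ... (a-k+1), with a_[0] = 1."""
    out = Fraction(1)
    for i in range(k):
        out *= a - i
    return out


def pmf_formula(r, m, n, theta):
    """P(r | m, n) as stated in Theorem 3.1."""
    return (falling(n, r) * rising(theta + r, m - r)
            / rising(theta + n, m) * comb(m, r))


def pmf_urn(m, n, theta):
    """Distribution of r obtained by stepping through the m urn draws."""
    dist = {0: Fraction(1)}
    for j in range(1, m + 1):
        total = theta + n + j - 1          # urn weight before draw j
        nxt = {}
        for r, p in dist.items():
            hit = Fraction(n - r) / total  # a not-yet-seen X_i is drawn
            if r < n:
                nxt[r + 1] = nxt.get(r + 1, 0) + p * hit
            nxt[r] = nxt.get(r, 0) + p * (1 - hit)
        dist = nxt
    return dist


if __name__ == "__main__":
    theta = Fraction(3, 2)
    print([pmf_formula(r, 4, 3, theta) for r in range(4)])
    print(pmf_urn(4, 3, theta))
```

Both routes give the same distribution on $\{0, \ldots, \min\{n, m\}\}$, and the values sum to one.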

The proof is provided in Section 4. Hence, the Chapman-Kolmogorov
condition becomes

\[
d_r(t + s) = \sum_{n=r}^{\infty} \sum_{m=r}^{\infty} r! \, C(n, r) C(m, r) \frac{\theta_{(n)} \theta_{(m)}}{\theta_{(n+m)} \theta_{(r)}} d_m(s) d_n(t) \tag{3.1}
\]

for all $s, t > 0$. There can be many solutions to this; we will look for those
within the class of death processes; that is, $d_n(t) = P(D_t = n)$. We let the
rate be $\lambda_n$ and so, in particular, we have $P(D_{s+t} = n \mid D_t = n) = P(T_n > s)$,
where $T_n$ is an exponential r.v. with parameter $\lambda_n$.

The death process also satisfies Chapman-Kolmogorov and so

\[
d_r(t + s) = \sum_{n=r}^{\infty} P(D_{t+s} = r \mid D_t = n) d_n(t).
\]

Comparing this with the Chapman-Kolmogorov condition in (3.1) for the
measure-valued process, we see that we should have

\[
\frac{\theta_{(r)} P(D_{t+s} = r \mid D_t = n)}{r! \, \theta_{(n)} C(n, r)} = \sum_{m=r}^{\infty} C(m, r) \frac{\theta_{(m)}}{\theta_{(n+m)}} d_m(s)
\]

and so

\[
\sum_{m=r}^{\infty} C(m, r) \frac{\theta_{(m)}}{\theta_{(n+m)}} d_m(s) = \frac{\theta_{(r)} (n - r)!}{\theta_{(n)} \, n!} P(D_{t+s} = r \mid D_t = n). \tag{3.2}
\]

This needs to be solved.

We will now show that the $d_n(t)$ given in (1.2) is a solution to (3.2). Ethier
and Griffiths [4] did much the same thing, but from a more complicated
starting point. Our demonstration, which follows, is now straightforward
given the result of Theorem 3.1.

Now

\[
P(D_{t+h} = n \mid D_t = n) = 1 - \lambda_n h + o(h) \tag{3.3}
\]

and

\[
P(D_{t+h} = n - 1 \mid D_t = n) = \lambda_n h + o(h). \tag{3.4}
\]

By considering

\[
\sum_{m=n}^{\infty} \frac{C(m, n)}{(\theta + m)_{(n)}} d_m(s)
\]

and

\[
\sum_{m=n-1}^{\infty} \frac{C(m, n - 1)}{(\theta + m)_{(n)}} d_m(s),
\]

which are part of (3.2) with $r = n$ and $r = n - 1$, respectively, with the help
of formulae appearing in [4], page 1585, we can show that the $d_m(s)$ given in
(1.2) satisfies the conditions for the death process. The details are provided
in Section 4.

Hence, we see that the complicated nature of the death process probabilities
is solely due to the form of $P(r \mid n, m)$, which is a property of the
Polya-urn scheme. For other processes, perhaps with a different choice of $\Pi$
which generates discrete random distribution functions (see Section 5.2) and
so yields a different $P(r \mid n, m)$, the fundamental problem in obtaining
a transition function satisfying Chapman-Kolmogorov is to find $d_m(s)$ such
that

\[
P(D_{t+s} = r \mid D_t = n) = \sum_{m=r}^{\infty} P(r \mid n, m) d_m(s).
\]

This appears to be the key.
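The fundamental equation above can be examined numerically in the two cases where the death process transition probability has a simple closed form, namely $r = n$, where it equals $e^{-\lambda_n s}$, and $r = n - 1$, where for a pure death process it equals $\lambda_n (e^{-\lambda_{n-1} s} - e^{-\lambda_n s}) / (\lambda_n - \lambda_{n-1})$. The following sketch is illustrative only; the function names and the truncation levels `terms` and `m_max` are assumptions, and it is restricted to $r \ge 1$.

```python
import math


def rising(a, m):
    """Ascending factorial a_(m), with a_(0) = 1."""
    out = 1.0
    for i in range(m):
        out *= a + i
    return out


def lam(n, theta):
    """Death rate lambda_n = n (n - 1 + theta) / 2."""
    return 0.5 * n * (n - 1 + theta)


def death_prob(n, s, theta, terms=60):
    """Truncated series (1.2) for d_n(s), n >= 1; `terms` is an assumed cutoff."""
    return sum((-1) ** (m - n) * math.comb(m, n) * rising(theta + n, m - 1)
               / math.factorial(m) * (2 * m - 1 + theta)
               * math.exp(-lam(m, theta) * s)
               for m in range(n, n + terms + 1))


def urn_pmf(r, n, m, theta):
    """P(r | n, m) from Theorem 3.1 (zero outside 0 <= r <= min(n, m))."""
    if r < 0 or r > min(n, m):
        return 0.0
    falling = math.factorial(n) // math.factorial(n - r)   # n_[r]
    return (falling * rising(theta + r, m - r)
            / rising(theta + n, m) * math.comb(m, r))


def mixed_transition(r, n, s, theta, m_max=40):
    """sum_{m >= r} P(r | n, m) d_m(s) for r >= 1, truncated at m_max."""
    return sum(urn_pmf(r, n, m, theta) * death_prob(m, s, theta)
               for m in range(r, m_max + 1))


if __name__ == "__main__":
    theta, s, n = 1.0, 1.0, 2
    print(mixed_transition(n, n, s, theta), math.exp(-lam(n, theta) * s))
```

The truncations are adequate here because, for $s$ of order one, $d_m(s)$ is negligible for large $m$; for very small $s$ many more terms would be needed.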
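4. Technical results.

Result [A]. We first prove that

\[
\sum_{m=n}^{\infty} \frac{C(m, n)}{(\theta + m)_{(n)}} d_m(s) = \frac{e^{-\lambda_n s}}{n!}.
\]

Now the left-hand side can be written as

\[
\sum_{m=n}^{\infty} \sum_{k=m}^{\infty} \frac{C(m, n)}{(\theta + m)_{(n)}} (-1)^{k-m} C(k, m) (\theta + m)_{(k-1)} \, k!^{-1} \gamma_{k,s,\theta},
\]

which is equal to

\[
\frac{\gamma_{n,s,\theta}}{n! \, (\theta + 2n - 1)} + \frac{1}{n!} \sum_{k=n+1}^{\infty} \frac{\gamma_{k,s,\theta}}{(k - n)!} \sum_{m=n}^{k} (-1)^{k-m} C(k - n, m - n) (\theta + n + m)_{(k-n-1)}. \tag{4.1}
\]
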
Lemma 4.1. It holds that

\[
\sum_{l=0}^{k} (-1)^{k-l} C(k, l) (\phi + l)_{(k-1)} = 0
\]

for any $\phi > 0$ and $k \ge 1$.

Proof. We will do this by induction and prove the more general result
that

\[
\sum_{l=0}^{k} (-1)^{k-l} C(k, l) (\phi + l)_{(k-r)} = 0
\]

for all $r \in \{1, \ldots, k\}$. Assume the result is true for all $k < K$ and for $r \in
\{R, \ldots, K\}$ when $k = K$. Now

\[
\begin{aligned}
\sum_{l=0}^{K} (-1)^{K-l} C(K, l) (\phi + l)_{(K-R+1)}
&= \phi \sum_{l=0}^{K} (-1)^{K-l} C(K, l) (\phi + l + 1)_{(K-R)} \\
&\quad + K \sum_{l=1}^{K} (-1)^{K-l} C(K - 1, l - 1) (1 + \phi + l)_{(K-1-R+1)},
\end{aligned}
\]

which by hypothesis is zero. To complete the proof, note that

\[
\sum_{l=0}^{k} (-1)^{k-l} C(k, l) = 0
\]

for all $k = 1, 2, \ldots$ and that the result is true for $K = 2$. □

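The result follows from (4.1) by setting $l = m - n$, and then substituting
$k - n$ for $k$, so that Lemma 4.1 applies with $\phi = \theta + 2n$. Hence, $d_m(\cdot)$ satisfies (3.3).

Result [B]. We next show, in the first instance, that

\[
H(s) = \sum_{m=n-1}^{\infty} \frac{m!}{(m - n + 1)! \, (\theta + m)_{(n)}} d_m(s) = \frac{1}{2} \, \frac{e^{-\lambda_{n-1} s} - e^{-\lambda_n s}}{\lambda_n - \lambda_{n-1}}.
\]
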

Using a result in [4], page 1585, we have

\[
H'(s) = -\lambda_n H(s) + \tfrac{1}{2} e^{-\lambda_{n-1} s}
\]

and so


\[
H(s) = \frac{1}{2} \, \frac{e^{-\lambda_{n-1} s}}{\lambda_n - \lambda_{n-1}} + C e^{-\lambda_n s}.
\]

Now $H(0) = 0$ and so

\[
C = -\frac{1}{2} \, \frac{1}{\lambda_n - \lambda_{n-1}},
\]

leading to the desired result. Now performing some elementary algebra on
(3.2) with $r = n - 1$, we obtain

\[
P(D_{t+s} = n - 1 \mid D_t = n) = n(n - 1 + \theta) H(s),
\]

which leads to the validity of (3.4).

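We could have proven the validity of (3.3) using this technique as well. If

\[
G(s) = \sum_{m=n}^{\infty} \frac{m!}{(m - n)! \, (\theta + m)_{(n)}} d_m(s),
\]

then

\[
G'(s) = \sum_{m=n}^{\infty} \frac{m!}{(m - n)! \, (\theta + m)_{(n)}} d_m'(s) = -\lambda_n G(s),
\]

as

\[
d_m'(s) = -\lambda_m d_m(s) + \lambda_{m+1} d_{m+1}(s).
\]

Hence, $G(s) = C \exp(-\lambda_n s)$ and since $G(0) = 1$, we have the result.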

Result [C]. Before proving Theorem 3.1, we need to establish the following
result (this is apparently a new combinatorial result):

Lemma 4.2. Let $\theta$ be a positive real. Then for $m \ge r \ge 0$ and defining
$0! = 1$,

\[
\sum_{k=0}^{m-r} k! \, C(k + r - 1, k) C(m - r, k) \theta_{(m-r-k)} = (\theta + r)_{(m-r)}.
\]

Proof. Let $|s(n, k)|$ denote the unsigned or absolute Stirling numbers
of the first kind. Expanding the $\theta$ terms on both sides of this relation, we
obtain

\[
\sum_{k=0}^{m-r} k! \, C(k + r - 1, k) C(m - r, k) \sum_{l=0}^{m-r-k} |s(m - r - k, l)| \theta^l
= \sum_{k=0}^{m-r} |s(m - r, k)| \sum_{l=0}^{k} C(k, l) \theta^l r^{k-l}.
\]

By changing the order of summation on both sides and collecting up the
terms, we have

\[
\begin{aligned}
&\sum_{l=0}^{m-r} \Biggl\{ \sum_{k=0}^{m-r-l} k! \, C(k + r - 1, k) C(m - r, k) |s(m - r - k, l)| \Biggr\} \theta^l \\
&\qquad = \sum_{l=0}^{m-r} \Biggl\{ \sum_{k=l}^{m-r} C(k, l) |s(m - r, k)| r^{k-l} \Biggr\} \theta^l.
\end{aligned}
\]

These are two polynomials in $\theta$ of degree $m - r$ and for them to be equal
$\forall \theta$, it suffices to establish the equality of the coefficients of the same powers
of $\theta$; that is,

\[
\sum_{k=0}^{m-r-l} k! \, C(k + r - 1, k) C(m - r, k) |s(m - r - k, l)| = \sum_{k=l}^{m-r} C(k, l) |s(m - r, k)| r^{k-l},
\]

for all $l = 0, 1, \ldots, m - r$.

To show this is true, we make use of an identity that appears in [1], page
824, which states that, for positive integers $a \le b \le c$,

\[
C(b, a) |s(c, b)| = \sum_{j=b-a}^{c-a} C(c, j) |s(c - j, a)| \, |s(j, b - a)|.
\]

The right-hand side of the equation can be written as

\[
\begin{aligned}
\sum_{k=l}^{m-r} C(k, l) |s(m - r, k)| r^{k-l}
&= \sum_{k=l}^{m-r} \sum_{j=k-l}^{m-r-l} C(m - r, j) |s(m - r - j, l)| \, |s(j, k - l)| r^{k-l} \\
&= \sum_{j=0}^{m-r-l} C(m - r, j) |s(m - r - j, l)| \sum_{k=l}^{j+l} |s(j, k - l)| r^{k-l} \\
&= \sum_{j=0}^{m-r-l} C(m - r, j) |s(m - r - j, l)| \sum_{k=0}^{j} |s(j, k)| r^{k} \\
&= \sum_{j=0}^{m-r-l} C(m - r, j) |s(m - r - j, l)| \, r_{(j)} \\
&= \sum_{j=0}^{m-r-l} j! \, C(j + r - 1, j) C(m - r, j) |s(m - r - j, l)|,
\end{aligned}
\]

where we have used the fact that $r_{(j)} = j! \, C(j + r - 1, j)$. This completes the
proof. □
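As a sanity check on Lemma 4.2, both sides of the identity can be evaluated exactly for small arguments. The sketch below uses rational arithmetic; the function names are hypothetical, and it is restricted to $r \ge 1$ because Python's `math.comb` does not accept the negative upper argument that would arise in $C(k + r - 1, k)$ when $r = 0$.

```python
from fractions import Fraction
from math import comb, factorial


def rising(a, k):
    """Ascending factorial a_(k) = a (a+1) ... (a+k-1), with a_(0) = 1."""
    out = Fraction(1)
    for i in range(k):
        out *= a + i
    return out


def lemma_lhs(m, r, theta):
    """sum_{k=0}^{m-r} k! C(k+r-1, k) C(m-r, k) theta_(m-r-k), for r >= 1."""
    return sum(factorial(k) * comb(k + r - 1, k) * comb(m - r, k)
               * rising(theta, m - r - k)
               for k in range(m - r + 1))


def lemma_rhs(m, r, theta):
    """(theta + r)_(m-r)."""
    return rising(theta + r, m - r)


if __name__ == "__main__":
    theta = Fraction(7, 2)
    print(lemma_lhs(6, 2, theta), lemma_rhs(6, 2, theta))
```

For instance, with $m - r = 2$ and $r = 1$ the left-hand side is $\theta(\theta + 1) + 2\theta + 2 = (\theta + 1)(\theta + 2)$, in agreement with the right-hand side.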

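Proof of Theorem 3.1. The closed form of the joint probability of

\[
[Y_1, \ldots, Y_m \mid X_1, \ldots, X_n]
\]

is given by

\[
P(Y_1, \ldots, Y_m \mid X_1, \ldots, X_n) = \prod_{j=1}^{m} \Biggl\{ \frac{\theta\nu_0(Y_j) + \sum_{i=1}^{n} \delta_{X_i}(Y_j) + \sum_{l=1}^{j-1} \delta_{Y_l}(Y_j)}{\theta + n + j - 1} \Biggr\}.
\]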

Assume without loss of generality that $X_1, \ldots, X_r$, the first $r$ observations
from $X_1, \ldots, X_n$, are those that are repeated when we obtain a sample
$Y_1, \ldots, Y_m$. Let $0 \le s_i \le m - r$, $i = 1, 2, \ldots, r$, and fix a number $k$ such that
$s_1 + s_2 + \cdots + s_r = k$, where $0 \le k \le m - r$.

Here the $s_i$ represent the multiplicity of the $X_i$, $i = 1, 2, \ldots, r$, that appear
in the sample when there are $k$ spaces available for those repetitions.
So, conditionally on $X_1, \ldots, X_n$, we are searching for the probability of the
simultaneous occurrence of the following events:

\[
\begin{aligned}
&Y_1 = X_1, \; Y_2 = X_2, \; \ldots, \; Y_r = X_r, \\
&Y_{r+j} = X_1, && 1 \le j \le s_1, \\
&Y_{r+s_1+j} = X_2, && 1 \le j \le s_2, \\
&\qquad \vdots \\
&Y_{r+s_1+\cdots+s_{r-1}+j} = X_r, && 1 \le j \le s_r, \\
&Y_{r+k+1} \in \mathbb{X} - \{X_1, \ldots, X_n\}, \\
&Y_{r+k+j} \in \mathbb{X} - \{Y_{r+k+1}, \ldots, Y_{r+k+j-1}, X_1, \ldots, X_n\} \text{ or} \\
&Y_{r+k+j} \in \{Y_{r+k+1}, \ldots, Y_{r+k+j-1}\}, && 2 \le j \le m - r - k,
\end{aligned}
\]
where $\mathbb{X}$ is the sample space. This probability is given by

\[
\frac{(s_1 + 1)! \cdots (s_r + 1)! \, \theta_{(m-r-k)}}{(\theta + n)_{(m)}}.
\]

Since these events are exchangeable, by taking into consideration the number
of repetitions of the $X_i$'s and the specific order of appearance of the new
values, which depend on the previous observations, then, for a given $k$ and
for fixed multiplicities $s_1, \ldots, s_r$, the probability of them occurring in any
order is given by

\[
\Biggl( \frac{m!}{(s_1 + 1)! \cdots (s_r + 1)! \, (m - r - k)!} \Biggr) \frac{(s_1 + 1)! \cdots (s_r + 1)! \, \theta_{(m-r-k)}}{(\theta + n)_{(m)}}
= \frac{m! \, \theta_{(m-r-k)}}{(m - r - k)! \, (\theta + n)_{(m)}}.
\]
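If we let $k$ and $s_1, \ldots, s_r$ vary, then this probability becomes

\[
\begin{aligned}
\sum_{k=0}^{m-r} \sum_{\{s_1 + \cdots + s_r = k\}} \frac{m! \, \theta_{(m-r-k)}}{(m - r - k)! \, (\theta + n)_{(m)}}
&= \sum_{k=0}^{m-r} C(k + r - 1, k) \frac{m! \, \theta_{(m-r-k)}}{(m - r - k)! \, (\theta + n)_{(m)}} \\
&= \frac{m!}{(m - r)!} \sum_{k=0}^{m-r} k! \, C(k + r - 1, k) C(m - r, k) \frac{\theta_{(m-r-k)}}{(\theta + n)_{(m)}},
\end{aligned}
\]

and since, as proven in Lemma 4.2,

\[
\sum_{k=0}^{m-r} k! \, C(k + r - 1, k) C(m - r, k) \theta_{(m-r-k)} = (\theta + r)_{(m-r)},
\]

this probability becomes

\[
\frac{m! \, (\theta + r)_{(m-r)}}{(m - r)! \, (\theta + n)_{(m)}}.
\]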

Finally, for any choice of $r$ $X$'s from $\{X_1, \ldots, X_n\}$, we have that

\[
P(r \mid m, n) = C(n, r) \frac{m! \, (\theta + r)_{(m-r)}}{(m - r)! \, (\theta + n)_{(m)}},
\]

which is given by

\[
\frac{n_{[r]} (\theta + r)_{(m-r)}}{(\theta + n)_{(m)}} C(m, r),
\]

as required. □

5. Discussion. We have shown how to construct a particular Fleming-Viot
process, for which the transition function is known, from basic ideas
involving the Dirichlet process and Markov processes, based on the Gibbs
sampler. This approach requires a new combinatorial result involving
Polya-urn schemes. In particular, the combinatorial complexities which arise with
the generator approach are avoided with the Chapman-Kolmogorov
condition, once Theorem 3.1 has been established. Here we briefly discuss a
number of points:

5.1. The case $\theta = 0$. Here we consider the case when $\theta = 0$. It is evident
that since $\Pi$ no longer exists in this case, there can be no stationary
distribution for the process. A stationary distribution, which is $\Pi(\theta\nu_0)$, can only
exist when $\theta > 0$. When $\theta = 0$, the death process has probabilities

\[
d_n(t) = \sum_{m=n}^{\infty} (-1)^{m-n} C(m, n) \, n_{(m-1)} \, m!^{-1} \gamma_{m,t},
\]

for $n \ge 2$, where $\gamma_{m,t} = (2m - 1) \exp\{-m(m - 1)t/2\}$, with $d_0(t) = 0$ and

\[
d_1(t) = 1 - \sum_{m=2}^{\infty} (-1)^{m} \gamma_{m,t}.
\]

Now $d_n(t)$ is the probability that there are $n$ equivalence classes at time $t$
in the coalescent of [9]. When $\theta = 0$, then

\[
P(Y_1, \ldots, Y_m \mid X_1, \ldots, X_n, n) = \mathcal{Q}\Bigl(\sum_{i=1}^{n} \delta_{X_i}\Bigr)
\]

and

\[
P(r \mid n, m) = \frac{n_{[r]} \, r_{(m-r)}}{n_{(m)}} C(m, r),
\]

which can be written as

\[
P(r \mid n, m) = r \, C(m, r) C(n, r) \frac{(n - 1)! \, (m - 1)!}{(n + m - 1)!},
\qquad n = 1, 2, \ldots; \; m = 1, 2, \ldots.
\]

Hence, for $n, m > 0$, we have $P(r = 0 \mid n, m) = 0$.

5.2. The next step. We believe the representation given is informative,
making a strong connection between Bayesian nonparametrics and population
genetics. It is also based on first principles for the construction of
a Markov process, namely, the proposal for a transition function and the
verification of the Chapman-Kolmogorov condition. What are the possible
directions in which this connection can be taken? The clear idea is that we
can consider alternative choices of $\Pi$ which generate discrete random
distribution functions. One class of such random distribution functions can be
generated via

\[
\mu(\cdot) = \sum_{i=1}^{\infty} p_i \delta_{V_i}(\cdot),
\]

where the $V_i$ are independent and identically distributed from some measure
$\nu_0$ and the $p_i$ have a stick-breaking structure; that is,

\[
p_1 = w_1 \quad \text{and} \quad p_i = w_i \prod_{j<i} (1 - w_j),
\]

where the $w_j$ have independent beta distributions, say, beta$(\alpha_j, \beta_j)$. Then
$\mu$ is almost surely a random probability measure when

\[
\sum_{j=1}^{\infty} \log(1 + \alpha_j / \beta_j) = \infty;
\]

see [7]. For example, the Dirichlet process arises when $\alpha_j = 1$ and $\beta_j = \theta$. The
two parameter Poisson-Dirichlet process, which is worth exploring, arises
when $\alpha_j = 1 - \alpha$ and $\beta_j = \theta + j\alpha$ for $0 \le \alpha < 1$ and $\theta > -\alpha$. To find the
transition function for this process and others, if they exist, we would need
to replicate Theorem 3.1, that is, find the appropriate $P(r \mid m, n)$ from the
predictive distributions, and then solve

\[
P(D_{t+s} = r \mid D_t = n) = \sum_{m=r}^{\infty} P(r \mid n, m) d_m(s)
\]

for an appropriate death process. Hence, we have a strategy for finding
alternative transition functions, which appears to be achievable.
Work on this is ongoing.
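The stick-breaking weights described above are easy to generate up to a finite truncation. The following sketch is illustrative: the function names and the truncation level `n_sticks` are assumptions made for the example, and the two special cases simply plug in the parameter choices quoted in the text.

```python
import random


def stick_breaking_weights(alphas, betas, rng):
    """Return p_1, ..., p_N with p_i = w_i * prod_{j<i} (1 - w_j),
    where the w_j are independent beta(alpha_j, beta_j) draws."""
    weights = []
    remaining = 1.0                 # mass not yet assigned: prod_{j<i} (1 - w_j)
    for a, b in zip(alphas, betas):
        w = rng.betavariate(a, b)
        weights.append(w * remaining)
        remaining *= 1.0 - w
    return weights


def dirichlet_process_weights(theta, n_sticks, rng):
    """Dirichlet process case: alpha_j = 1, beta_j = theta (truncated)."""
    return stick_breaking_weights([1.0] * n_sticks, [theta] * n_sticks, rng)


def two_parameter_weights(theta, alpha, n_sticks, rng):
    """Two parameter case: alpha_j = 1 - alpha, beta_j = theta + j * alpha."""
    js = range(1, n_sticks + 1)
    return stick_breaking_weights([1.0 - alpha] * n_sticks,
                                  [theta + j * alpha for j in js], rng)


if __name__ == "__main__":
    rng = random.Random(0)
    p = dirichlet_process_weights(2.0, 300, rng)
    print(sum(p), p[:5])
```

The weights are generated in stick-breaking order rather than in the ranked order $p_1 > p_2 > \cdots$ of (1.1), and the truncated weights sum to one minus the mass left on the unbroken part of the stick.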

5.3. An inequality. Here we consider the usefulness of

\[
\sum_{m=n}^{\infty} \frac{C(m, n)}{(\theta + m)_{(n)}} d_m(t) = \frac{e^{-\lambda_n t}}{n!}.
\]

For example, by putting $n = 1$, we have

\[
\sum_{m=1}^{\infty} \frac{m}{\theta + m} d_m(t) = e^{-\lambda_1 t}.
\]

Hence, it is easy to obtain

\[
e^{-\lambda_1 t} \le 1 - d_0(t) \le (1 + \theta) e^{-\lambda_1 t}
\]

and it is also clear how to obtain improved inequalities from this identity.